MayoBMI at ImageCLEF 2016 Handwritten Document Retrieval Task

Authors

  • Sijia Liu
  • Yanshan Wang
  • Saeed Mehrabi
  • Dingcheng Li
  • Hongfang Liu
Abstract

In this working note, we describe our participation in the ImageCLEF 2016 Handwritten Document Retrieval Task. We focused mainly on two problems: hyphenation detection using line images, and information retrieval using n-best results. The hyphenation detection step extracts image features from the beginning and end of each line and uses a binary classifier to determine whether the line contains hyphenation. A spelling correction step then eliminates errors that arise when the fragment of a broken word at the end of one line is concatenated with the fragment at the beginning of the next line. The final text retrieval step applies a suffix-stripping algorithm to normalize word tense and form, and a TF-IDF scheme to rank the retrieved relevant segments in our submission.
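The text-side steps of the abstract (joining a broken word, correcting its spelling, stripping suffixes, and ranking with TF-IDF) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the toy suffix list, the similarity cutoff, and the smoothed IDF formula are all hypothetical choices, and the image-based hyphenation classifier is assumed to have already flagged which line pairs to join.

```python
import difflib
import math
from collections import Counter

# Toy suffix list (assumption); a real system would use a full stemmer
# such as Porter's suffix-stripping algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def join_broken_word(line_end, next_line_start, vocabulary):
    """Concatenate two word fragments and snap the result to the closest
    vocabulary entry, if one is similar enough (cutoff is an assumption)."""
    candidate = line_end.rstrip("-") + next_line_start
    matches = difflib.get_close_matches(candidate, vocabulary, n=1, cutoff=0.8)
    return matches[0] if matches else candidate


def strip_suffix(word):
    """Crude suffix stripping that keeps a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, split on whitespace, and strip suffixes."""
    return [strip_suffix(w) for w in text.lower().split()]


def rank_segments(query, segments):
    """Return (segment_index, score) pairs sorted by descending TF-IDF score."""
    docs = [Counter(tokenize(s)) for s in segments]
    n_docs = len(docs)
    query_terms = tokenize(query)
    scores = []
    for index, doc in enumerate(docs):
        total = sum(doc.values()) or 1
        score = 0.0
        for term in query_terms:
            df = sum(1 for d in docs if term in d)
            if df == 0:
                continue  # term never occurs in the collection
            idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF
            score += (doc[term] / total) * idf
        scores.append((index, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

For example, `join_broken_word("com-", "mitee", ["committee", "comet"])` snaps the misspelled concatenation to a known word, and `rank_segments` orders candidate text segments so that those sharing more stemmed query terms, relative to their length, come first.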

Similar resources

UAEMex at ImageCLEF 2016: Handwritten Retrieval

This paper describes the participation of UAEMex in the ImageCLEF 2016 Handwritten Scanned Document Retrieval Task. We propose a skip-character text search method based on the Longest Common Subsequence. Our system splits the query into its characters and finds all Longest Common Subsequences within a single line of text.
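To make the idea of Longest Common Subsequence matching concrete, here is a minimal sketch of the standard dynamic-programming LCS length, used to score how much of a query survives, in order, within a line of text. It is not the UAEMex system itself: the function names and the normalization by query length are assumptions made for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b,
    computed row by row to keep memory proportional to len(b)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def match_score(query, line):
    """Fraction of query characters found in order in the line
    (hypothetical scoring rule, case-insensitive)."""
    if not query:
        return 0.0
    return lcs_length(query.lower(), line.lower()) / len(query)
```

Because a subsequence may skip characters, a query such as "court" still scores highly against a line containing a misrecognized form like "cuort", which is what makes this kind of matching tolerant of recognition errors.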

CITlab ARGUS for Keyword Search in Historical Handwritten Documents - Description of CITlab's System for the ImageCLEF 2016 Handwritten Scanned Document Retrieval Task

We describe CITlab's recognition system for the Handwritten Scanned Document Retrieval Task 2016, part of CLEF 2016, held in Évora, Portugal, 5-8 September 2016 (see [9]). The task is to locate positions that match a given query, possibly consisting of more than one keyword, in a number of historical handwritten documents. The core algorithms of our system are based on mul...

Overview of the ImageCLEF 2016 Handwritten Scanned Document Retrieval Task

The ImageCLEF 2016 Handwritten Scanned Document Retrieval Task was the first edition of a challenge aimed at developing retrieval systems for handwritten documents. Several novelties were introduced in comparison to other recent related evaluations, specifically: multiple word queries, finding local blocks of text, results in transition between consecutive pages, handling words broken between l...

General Overview of ImageCLEF at the CLEF 2016 Labs

This paper presents an overview of the ImageCLEF 2016 evaluation campaign, an event that was organized as part of the CLEF (Conference and Labs of the Evaluation Forum) labs 2016. ImageCLEF is an ongoing initiative that promotes the evaluation of technologies for annotation, indexing and retrieval for providing information access to collections of images in various usage scenarios and domains. ...

INAOE's participation at ImageCLEF 2016: Text Illustration Task

In this paper we describe the participation of the Language Technologies Lab of INAOE in ImageCLEF 2016 teaser 1: Text Illustration (TI). The goal of the TI task is to find the image that best describes a given document query. The task is evaluated on a dataset of web pages containing text and images. We address TI as a purely Information Retrieval (IR) task, for a gi...

Publication date: 2016